[Cadence: Vision] ResNet18 & ResNet50: Optimized, DMA-enabled, functional by cad-rlc · Pull Request #19111 · pytorch/executorch

cad-rlc · 2026-04-24T15:18:38Z

Summary

Optimized Cadence Vision 130 DSP operators for ResNet-18 and ResNet-50 inference. All operators are DMA-enabled, using ping-pong tiling sized to the available local data RAM (DRAM). They fall back to cache mode when no DRAM is configured or when the available DRAM is too small for a given kernel. All operators are functionally verified (int8 quantized, NCHW layout).
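A minimal sketch of the ping-pong (double-buffered) tiling pattern described above, under stated assumptions: `dma_copy_async`, `dma_wait` and `process_tile` are hypothetical stand-ins, not the iDMA API used by the PR. The DMA calls are emulated with `memcpy` so the sketch runs on a host; on the DSP the fetch of tile t+1 would overlap with compute on tile t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical DMA stand-ins. On hardware these would be asynchronous
 * iDMA transfers, and dma_wait() would block until all outstanding
 * transfers finish. Here they are synchronous memcpy calls. */
static void dma_copy_async(void *dst, const void *src, size_t bytes) {
    memcpy(dst, src, bytes);
}
static void dma_wait(void) { /* nothing outstanding in the emulation */ }

/* Hypothetical per-tile kernel: scales a tile in place. */
static void process_tile(float *tile, size_t n) {
    for (size_t i = 0; i < n; ++i) tile[i] *= 2.0f;
}

/* Double-buffered pass over src[0..n). `dram` must hold 2 * tile floats:
 * while one half is processed, the next tile is fetched into the other. */
static void pingpong_process(const float *src, float *dst, size_t n,
                             float *dram, size_t tile) {
    assert(tile > 0);
    float *buf[2] = { dram, dram + tile };
    size_t num_tiles = (n + tile - 1) / tile;
    if (num_tiles == 0) return;

    /* Prefetch tile 0 into buffer 0. */
    dma_copy_async(buf[0], src, (n < tile ? n : tile) * sizeof(float));

    for (size_t t = 0; t < num_tiles; ++t) {
        size_t off = t * tile;
        size_t len = (n - off) < tile ? (n - off) : tile;
        dma_wait(); /* tile t resident; write-back of tile t-1 finished */

        /* Start fetching tile t+1 into the other buffer. */
        if (t + 1 < num_tiles) {
            size_t noff = off + tile;
            size_t nlen = (n - noff) < tile ? (n - noff) : tile;
            dma_copy_async(buf[(t + 1) & 1], src + noff, nlen * sizeof(float));
        }

        process_tile(buf[t & 1], len);                         /* compute */
        dma_copy_async(dst + off, buf[t & 1], len * sizeof(float)); /* write back */
    }
    dma_wait();
}
```

The last tile may be shorter than `tile`, which is why every copy length is recomputed per iteration.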

Operators

Conv2d (`quantized_conv2d_nchw`)

Kernel variants: 7x7j2, 3x3j1, 3x3j2, 1x1j1, 1x1j2 (kernel size and stride; e.g. 3x3j2 is a 3×3 kernel with stride 2)
Modes: DMA ping-pong tiling (with iDMA) and cache-only (no DMA)
Dispatch: Automatic kernel selection based on layer configuration (kernel size, stride, dilation)
Quantization: int8 asymmetric input × symmetric weights, per-tensor output scaling
Bias correction: 24-bit clamped kernel bias with post-kernel residual correction
Config generator: Python tool to generate per-DRAM-size layer configuration headers
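The automatic kernel selection can be pictured as a lookup from the layer configuration to one of the five variants, reading `jN` as stride N. This is a hypothetical sketch: the enum and function names are illustrative, and restricting the optimized variants to square kernels, equal strides and dilation 1 (everything else going to a generic path) is our assumption, not something stated in the PR.

```c
#include <assert.h>

/* Hypothetical identifiers for the optimized variants listed above. */
typedef enum {
    CONV_GENERIC = 0, /* fallback: no specialized kernel */
    CONV_7X7J2,
    CONV_3X3J1,
    CONV_3X3J2,
    CONV_1X1J1,
    CONV_1X1J2
} conv_variant;

/* Select a variant from kernel size, stride and dilation.
 * Assumes optimized variants need square kernels, equal H/W strides
 * and dilation 1. */
static conv_variant select_conv_variant(int kh, int kw, int stride_h,
                                        int stride_w, int dil_h, int dil_w) {
    assert(kh > 0 && kw > 0 && stride_h > 0 && stride_w > 0);
    if (kh != kw || stride_h != stride_w || dil_h != 1 || dil_w != 1)
        return CONV_GENERIC;
    int k = kh, s = stride_h;
    if (k == 7 && s == 2) return CONV_7X7J2; /* ResNet stem */
    if (k == 3 && s == 1) return CONV_3X3J1;
    if (k == 3 && s == 2) return CONV_3X3J2;
    if (k == 1 && s == 1) return CONV_1X1J1;
    if (k == 1 && s == 2) return CONV_1X1J2; /* downsample shortcut */
    return CONV_GENERIC;
}
```

These five shapes cover every convolution in ResNet-18 and ResNet-50, which is consistent with the variant list above.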

MaxPool2d (`maxpool_exec_mxnj2`)

Kernel: Arbitrary M×N kernel size, stride-2
Modes: DMA tiled and cache-only (no DMA)
Layout: NCHW float32
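For reference, the computation performed by an M×N, stride-2 max pool on NCHW float32 data can be written as a plain scalar loop. This sketch omits the DMA tiling and SIMD of `maxpool_exec_mxnj2`. The padding parameters and the function name are assumptions for illustration; ResNet's max pool is 3×3 with stride 2 and padding 1.

```c
#include <assert.h>
#include <float.h>

/* Reference M x N max pooling, stride 2, NCHW float32.
 * Out-of-bounds (padded) positions are skipped. */
static void maxpool_mxn_stride2(const float *in, float *out,
                                int n, int c, int h, int w,
                                int m_k, int n_k, int pad_h, int pad_w) {
    assert(m_k > 0 && n_k > 0);
    assert(h + 2 * pad_h >= m_k && w + 2 * pad_w >= n_k);
    const int stride = 2;
    int oh = (h + 2 * pad_h - m_k) / stride + 1;
    int ow = (w + 2 * pad_w - n_k) / stride + 1;
    for (int plane = 0; plane < n * c; ++plane) { /* each (batch, channel) */
        const float *ip = in + (long)plane * h * w;
        float *op = out + (long)plane * oh * ow;
        for (int y = 0; y < oh; ++y) {
            for (int x = 0; x < ow; ++x) {
                float best = -FLT_MAX;
                for (int i = 0; i < m_k; ++i) {
                    int iy = y * stride - pad_h + i;
                    if (iy < 0 || iy >= h) continue;
                    for (int j = 0; j < n_k; ++j) {
                        int ix = x * stride - pad_w + j;
                        if (ix < 0 || ix >= w) continue;
                        if (ip[iy * w + ix] > best) best = ip[iy * w + ix];
                    }
                }
                op[y * ow + x] = best;
            }
        }
    }
}
```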

Mean / AdaptiveAvgPool (`mean_exec_dma`)

Kernel: SIMD-optimized channel-wise mean with DMA tiling
Layout: NCHW float32, reduces spatial dimensions to 1×1
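As a scalar reference for what the mean operator produces (N×C×H×W in, N×C×1×1 out), the sketch below averages each channel plane. The SIMD and DMA tiling are omitted, and accumulating in `double` is our choice for accuracy, not necessarily what the DSP kernel does.

```c
#include <assert.h>

/* Reference NCHW float32 mean over H and W; output shape is N x C x 1 x 1.
 * Accumulates in double to limit rounding error on large planes. */
static void spatial_mean_nchw(const float *in, float *out,
                              int n, int c, int h, int w) {
    assert(h > 0 && w > 0);
    long hw = (long)h * w;
    for (long p = 0; p < (long)n * c; ++p) {
        const float *plane = in + p * hw;
        double acc = 0.0;
        for (long i = 0; i < hw; ++i) acc += plane[i];
        out[p] = (float)(acc / (double)hw);
    }
}
```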

Quantize / Dequantize (`quantize_per_tensor`, `dequantize_per_tensor`)

Modes: DMA ping-pong and hardware-optimized (no DMA)
Type: int8 asymmetric (asym8s)
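The per-tensor asym8s mapping follows the usual affine formulas q = clamp(round(x / scale) + zero_point, -128, 127) and x = (q - zero_point) × scale. The sketch below is a scalar reference only. The function names are hypothetical, and the round-half-away-from-zero mode is an assumption (PyTorch's reference uses round-half-to-even, and the PR does not state which mode the DSP kernels use).

```c
#include <assert.h>
#include <stdint.h>

/* Round half away from zero without libm (assumed rounding mode). */
static long round_half_away(float v) {
    return (long)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* asym8s quantization: q = clamp(round(x / scale) + zero_point, -128, 127). */
static void quantize_per_tensor_asym8s(const float *x, int8_t *q, int n,
                                       float scale, int32_t zero_point) {
    assert(scale > 0.0f);
    for (int i = 0; i < n; ++i) {
        float r = x[i] / scale;
        /* Saturate early so the float-to-integer conversion cannot overflow;
         * any |r| above 512 clamps to the int8 range anyway. */
        if (r > 512.0f) r = 512.0f;
        if (r < -512.0f) r = -512.0f;
        long v = round_half_away(r) + zero_point;
        q[i] = (int8_t)(v < -128 ? -128 : (v > 127 ? 127 : v));
    }
}

/* asym8s dequantization: x = (q - zero_point) * scale. */
static void dequantize_per_tensor_asym8s(const int8_t *q, float *x, int n,
                                         float scale, int32_t zero_point) {
    for (int i = 0; i < n; ++i)
        x[i] = (float)((int32_t)q[i] - zero_point) * scale;
}
```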

Quantized ReLU (`quantized_relu`)

Modes: DMA ping-pong and hardware-optimized (no DMA)
Type: int8 clamp
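One way to read "int8 clamp": in asym8s, real 0.0 is represented by the zero point, so ReLU becomes a lower clamp at that value. This sketch assumes input and output share the same scale and zero point, so no requantization is needed. The real operator may also rescale its output, which is not shown here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ReLU on asym8s data with shared input/output quantization parameters:
 * values below the zero point (negative reals) become the zero point. */
static void quantized_relu_asym8s(const int8_t *in, int8_t *out, int n,
                                  int8_t zero_point) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out[i] = in[i] < zero_point ? zero_point : in[i];
}
```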

Quantized Linear (`quantized_linear_out`)

Mode: SIMD execution with DMA tiling
Type: int8 input × int8 weights, int32 bias
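The arithmetic behind an int8 × int8 linear layer with int32 bias is an int32 accumulation of zero-point-corrected products, followed by requantization to int8. The sketch below is a scalar reference under assumptions. It uses a float requantization multiplier (`requant_scale = x_scale * w_scale / y_scale`) for clarity, whereas DSP kernels typically use a fixed-point multiplier and shift. The weight layout `[out_features][in_features]` and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* acc[j] = bias[j] + sum_k (x[k] - x_zp) * (w[j][k] - w_zp)
 * y[j]   = clamp(round(acc[j] * requant_scale) + y_zp, -128, 127) */
static void quantized_linear_ref(const int8_t *x, const int8_t *w,
                                 const int32_t *bias, int8_t *y,
                                 int in_features, int out_features,
                                 int32_t x_zp, int32_t w_zp, int32_t y_zp,
                                 float requant_scale) {
    assert(in_features > 0 && out_features > 0);
    for (int j = 0; j < out_features; ++j) {
        int32_t acc = bias ? bias[j] : 0;
        const int8_t *row = w + (long)j * in_features; /* weights: [out][in] */
        for (int k = 0; k < in_features; ++k)
            acc += (x[k] - x_zp) * (row[k] - w_zp);
        float r = (float)acc * requant_scale;
        if (r > 512.0f) r = 512.0f;   /* saturate before rounding */
        if (r < -512.0f) r = -512.0f;
        long v = (long)(r >= 0.0f ? r + 0.5f : r - 0.5f) + y_zp;
        y[j] = (int8_t)(v < -128 ? -128 : (v > 127 ? 127 : v));
    }
}
```

With symmetric weights (as used by the conv operators above), `w_zp` is simply 0.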

Add (`op_add`)

Mode: DMA ping-pong element-wise float32 addition
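To show how a DRAM budget can translate into a tile size for this operator, the sketch assumes two input streams and one output stream, each ping-pong buffered, so six float buffers of `tile` elements must fit. That buffer count, and both function names, are hypothetical. The local-memory copies themselves are omitted; only the tile loop is kept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizing: 3 streams x 2 (ping-pong) float buffers in
 * dram_bytes. Returns 0 when nothing fits (caller would use cache mode). */
static size_t add_tile_elems(size_t dram_bytes) {
    return dram_bytes / (3 * 2 * sizeof(float));
}

/* Element-wise add processed tile by tile, mirroring the DMA schedule. */
static void add_tiled(const float *a, const float *b, float *out,
                      size_t n, size_t tile) {
    assert(tile > 0);
    for (size_t off = 0; off < n; off += tile) {
        size_t len = (n - off) < tile ? (n - off) : tile;
        for (size_t i = 0; i < len; ++i)
            out[off + i] = a[off + i] + b[off + i];
    }
}
```

With a 64 KB DRAM bank and 4-byte floats this gives 65536 / 24 = 2730 elements per tile.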

Softmax (`op_softmax`)

Mode: Hardware-optimized softmax

Build Configuration

Supports configurable DRAM buffer sizes
Automatic dispatch between DMA and cache-only modes based on DRAM availability
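The DMA/cache dispatch can be thought of as a fit check: use ping-pong tiling only if the kernel's minimum working set fits in the configured data RAM. The rule below is a hypothetical sketch. The working-set model (two copies of each streamed tile plus any data kept resident, such as weights) and all names are assumptions, not the PR's actual per-layer configuration logic.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MODE_CACHE = 0, MODE_DMA_PINGPONG = 1 } exec_mode;

/* Choose DMA ping-pong when 2 buffers per stream plus resident data fit
 * in dram_bytes; otherwise run from cache. */
static exec_mode choose_mode(size_t dram_bytes, size_t tile_bytes_per_stream,
                             int num_streams, size_t resident_bytes) {
    assert(num_streams >= 0);
    if (dram_bytes == 0) return MODE_CACHE; /* no data RAM configured */
    size_t need = 2 * tile_bytes_per_stream * (size_t)num_streams
                  + resident_bytes;
    return need <= dram_bytes ? MODE_DMA_PINGPONG : MODE_CACHE;
}
```

Under this model, generating per-DRAM-size layer configurations ahead of time amounts to evaluating the check once per layer for each supported DRAM size.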

Test Configuration

Vision 130 DSP core configuration used for testing:

Data Cache size: 65536 bytes (64 KB)
Data Cache ways: 2
Data Cache line size: 128 bytes
Data Cache write-back: Enabled
Number of Data Cache banks: 2
Instruction Cache size: 65536 bytes (64 KB)
Instruction Cache ways: 4
Instruction Cache line size: 256 bytes
Data RAM 0: 4K / 8K / 16K / 24K / 32K / 64K
Data RAM 1: 4K / 8K / 16K / 24K / 32K / 64K
Instruction RAM: 0

Performance Results

We observed speedups of approximately 45× (ResNet-18) and 55× (ResNet-50) for complete inference with the optimized operators compared to the generic operators, measured with memory modeling enabled (`--mem_model --mlatency=40 --blockrepeat=1 --write_delay=40 --write_repeat=1`) and 64K each of DRAM0 and DRAM1.

cc @mcremon-meta @hsharma35 @zonglinpengmeta

pytorch-bot · 2026-04-24T15:18:42Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19111

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 11 Awaiting Approval

As of commit 0cccddf with merge base ec31735:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2026-04-24T15:18:57Z

The following ciflow label(s) have been added but CI has not been triggered yet because the workflows are awaiting approval:

ciflow/trunk

Once a maintainer approves the workflows (scroll to the bottom of the PR page), the corresponding CI jobs will be triggered automatically. Please ping one of the reviewers if you do not have access to approve and run workflows.

github-actions · 2026-04-24T15:19:32Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

cad-rlc · 2026-05-08T10:33:42Z

@mcremon-meta @hsharma35 @zonglinpengmeta
This is the final PR for the ResNet18 and ResNet50 models.

mcremon-meta

Will continue the review later, but can we clean the set of files first? I don't quite understand why we have so many files checked in, including CMakeFiles etc.

mcremon-meta · 2026-05-08T20:21:06Z

@@ -0,0 +1,25 @@
+Collecting matplotlib


not sure what this file is?

mcremon-meta · 2026-05-08T20:21:32Z

      kernel_name: impl::generic::quantized_matmul_asym8uxasym8u_asym8u_out

- func: cadence::im2row.out(Tensor input, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, Tensor in_zero_point, bool channel_last=False, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::im2row.out(Tensor input, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, Tensor in_zero_point, bool channel_last, *, Tensor(a!) out) -> Tensor(a!)


why is this needed?

cad-rlc · 2026-05-15T12:51:25Z

@mcremon-meta A few stale files were accidentally committed in this pull request. We are addressing this and will submit a new PR shortly.

linux-foundation-easycla · 2026-05-28T13:42:26Z

❌ The email address for the commits (3dd5559, 4491310, 6a46467, 9991681, bd7fb9f, c2a48d2, f1693c2) is not linked to the GitHub account, preventing the EasyCLA check. Consult this Help Article and GitHub Help to resolve. (To view a commit's email address, add .patch at the end of this PR page's URL.) For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.
❌ - login: @cad-rlc / name: cad-rlc. The commits (0cccddf, 93271c8) are not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

…onal
- Add DMA-optimized operators: conv2d (1x1/3x3/7x7), maxpool, quantize/dequantize, relu, add, mean, softmax, linear
- Add new operators: embedding, full, im2row, quantized_fully_connected, quantized_layer_norm, quantized_matmul, requantize, view_copy
- Add vision/kernels library and quantized_ops.h header
- Add config generator for DMA buffer sizing
- Update functions_vision.yaml and CMakeLists.txt
- Add third-party XAI libraries (libxai, libxai_common, libxa_nnlib)
- FACTO submodule update

cad-rlc · 2026-05-31T06:24:31Z

@mcremon-meta
We have addressed the previously reported issues and comments. Kindly review the updated changes.
Vision 130 DSP core configuration used for testing:
Data Cache size: 65536 bytes (64 KB)
Data Cache ways: 2
Data Cache line size: 128 bytes
Data Cache write-back: Enabled
Number of Data Cache banks: 2
Instruction Cache size: 65536 bytes (64 KB)
Instruction Cache ways: 4
Instruction Cache line size: 256 bytes
Data RAM 0: 4K / 8K / 16K / 24K / 32K / 64K
Data RAM 1: 4K / 8K / 16K / 24K / 32K / 64K
Instruction RAM: 0
We observed speedups of approximately 45× (ResNet-18) and 55× (ResNet-50) for complete inference with the optimized operators compared to the generic operators, measured with memory modeling enabled (`--mem_model --mlatency=40 --blockrepeat=1 --write_delay=40 --write_repeat=1`) and 64K each of DRAM0 and DRAM1.

cad-rlc requested review from GregoryComer, JacobSzwejbka, digantdesai, kimishpatel, kirklandsign, larryliu0820, lucylq, manuelcandales and mergennachin as code owners April 24, 2026 15:18

meta-cla bot added the CLA Signed label Apr 24, 2026

github-actions bot added the ciflow/trunk and module: arm labels Apr 24, 2026

mcremon-meta requested changes May 8, 2026


Suraj Raut added 7 commits May 28, 2026 07:16
- Reset non-Cadence files to upstream/main (c2a48d2)
- Remove accidental files (9991681)
- Sync submodule pointers with upstream/main (3dd5559)
- Sync remaining files with upstream/main (6a46467)
- Reset backends/cadence/aot/ to upstream (keep functions_vision.yaml) (4491310)
- Reset submodule pointers to upstream/main (f1693c2)

cad-rlc force-pushed the main branch from 589ae7b to f1693c2 Compare May 29, 2026 08:26

cad-rlc requested review from psiddh and robert-kalmar as code owners May 29, 2026 08:26

- Merge branch 'pytorch:main' into main (93271c8)
- Merge branch 'main' into main (0cccddf)

cad-rlc requested a review from mcremon-meta June 2, 2026 08:22


cad-rlc wants to merge 9 commits into pytorch:main from cad-rlc:main
